Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework have achieved remarkable success on various TOD benchmarks. However, we argue that current TOD benchmarks are limited in how well they surrogate real-world scenarios, and that current TOD models are still a long way from handling such scenarios. In this position paper, we first identify the current status and limitations of SF-TOD systems. We then explore the WebTOD framework, an alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system learns how to understand the web/mobile interface that the human agent interacts with, powered by a large-scale language model.
In this work, we explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics: stress tests with synthetic data. Basically, we design and synthesize a wide range of potential errors and check whether they result in a commensurate drop in the metric scores. We examine a range of recently proposed evaluation metrics based on pretrained language models, for the tasks of open-ended generation, translation, and summarization. Our experiments reveal interesting insensitivities, biases, or even loopholes in existing metrics. For example, we find that BERTScore ignores truncation errors in summarization, and MAUVE (built on top of GPT-2) is insensitive to errors at the beginning of generations. Further, we investigate the reasons behind these blind spots and suggest practical workarounds for a more reliable evaluation of text generation.
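As a minimal sketch of this stress-testing recipe, the snippet below injects a synthetic truncation error into candidate summaries and checks whether a metric's score drops accordingly; `metric_score` is a hypothetical stand-in for any reference-based metric (e.g., a BERTScore or MAUVE wrapper), not an API from the paper.

```python
# Minimal stress-test sketch: inject a synthetic error, compare metric scores.

def truncate(text: str, keep_ratio: float = 0.5) -> str:
    """Synthesize a truncation error by dropping the tail of a candidate."""
    words = text.split()
    return " ".join(words[: max(1, int(len(words) * keep_ratio))])

def stress_test(metric_score, candidates, references) -> float:
    """Return the score drop caused by the injected error.

    `metric_score(candidates, references)` is a hypothetical callable that
    returns a single corpus-level score; a drop near zero suggests the
    metric is blind to this error type.
    """
    clean = metric_score(candidates, references)
    corrupted = metric_score([truncate(c) for c in candidates], references)
    return clean - corrupted
```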
Careful design of audio representations has become a dominant feature of approaches to many speech tasks. Such approaches increasingly emphasize "disentanglement", where a representation contains only the part of the signal relevant to transcription while discarding irrelevant information. In this paper, we construct a representation learning task based on joint modeling of ASR and TTS, and seek to learn a representation of audio that disentangles the part of the speech signal relevant to transcription from the part that is not. We provide empirical evidence that successfully finding such a representation is tied to the randomness inherent in training. We then observe that these desired, disentangled solutions have distinct statistical properties with respect to the optimization problem. Finally, we show that enforcing these properties during training improves our joint modeling task by an average of 24.5% relative. These observations motivate a novel approach to learning effective audio representations.
Understanding how machine learning models generalize to new environments is a critical part of their safe deployment. Recent work has proposed various complexity measures that directly predict or theoretically bound a model's generalization ability. However, these approaches rely on a strong set of assumptions that are not always satisfied in practice. Motivated by the limited settings in which existing measures can be applied, we propose a novel complexity measure based on the local manifold smoothness of a classifier. We define local manifold smoothness as a classifier's output sensitivity to perturbations in the manifold neighborhood around a given test point. Intuitively, a classifier that is less sensitive to these perturbations should generalize better. To estimate smoothness, we sample points using data augmentation and measure the fraction of these points classified as the majority class. Our method only requires choosing a data augmentation method and makes no other assumptions about the model or data distribution, meaning it can be applied even in out-of-distribution (OOD) settings where existing measures cannot. In experiments on robustness benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our manifold smoothness measure and actual OOD generalization on over 3,000 models evaluated on more than 100 train/test domain pairs.
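A minimal sketch of the smoothness estimate described above, assuming user-supplied `predict` and `augment` functions; the sample count and sampling scheme are illustrative choices, not the authors' exact protocol.

```python
import numpy as np

def manifold_smoothness(predict, augment, x, n_samples: int = 50) -> float:
    """Estimate local manifold smoothness around a test point `x`.

    `predict` maps an input to a class label; `augment` draws a random
    on-manifold perturbation of `x` (e.g., an image or text augmentation).
    Returns the fraction of augmented points assigned the majority class:
    values near 1.0 indicate a locally smooth (insensitive) classifier.
    """
    labels = [predict(augment(x)) for _ in range(n_samples)]
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / n_samples
```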
We propose a method to make natural language understanding models more efficient by storing knowledge in an external knowledge graph (KG) and retrieving from this KG using a dense index. Given (possibly multilingual) downstream task data, e.g., sentences in German, we retrieve entities from the KG and use their multimodal representations to improve downstream task performance. We use the recently released VisualSem KG as our external knowledge repository, which covers a subset of Wikipedia and WordNet entities, and compare a mix of tuple-based and graph-based algorithms to learn entity and relation representations grounded in the KG's multimodal information. We demonstrate the usefulness of the learned entity representations on two downstream tasks: we improve performance on a multilingual named entity recognition task by 0.3%-0.7% F1, and achieve up to 2.5% improvement in accuracy on a visual sense disambiguation task. All our code and data are available at: \url{https://github.com/iacercalixto/visualsem-kg}.
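As an illustration of the dense-retrieval step, the sketch below ranks KG entities by cosine similarity between a query embedding and precomputed multimodal entity embeddings; the function name and the plain NumPy index are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def retrieve_entities(query_vec: np.ndarray,
                      entity_matrix: np.ndarray,
                      k: int = 5) -> np.ndarray:
    """Return indices of the top-k KG entities by cosine similarity.

    `query_vec` is a sentence embedding for the downstream input;
    `entity_matrix` holds one multimodal embedding per KG entity, row-wise.
    """
    q = query_vec / (np.linalg.norm(query_vec) + 1e-9)
    e = entity_matrix / (np.linalg.norm(entity_matrix, axis=1, keepdims=True) + 1e-9)
    scores = e @ q                      # cosine similarity per entity
    return np.argsort(-scores)[:k]      # indices of the k closest entities
```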
It is widely accepted in the mode connectivity literature that when two neural networks are trained similarly on the same data, they are connected by a path through parameter space along which test set accuracy is maintained. Under some circumstances, including transfer learning from pretrained models, these paths are presumed to be linear. In contrast to existing results, we find that among text classifiers (trained on MNLI, QQP, and CoLA), some pairs of finetuned models have large barriers of increasing loss on the linear paths between them. On each task, we find distinct clusters of models that are linearly connected on the test loss surface but disconnected from models outside the cluster; that is, the clusters occupy separate basins on the surface. By measuring performance on specially crafted diagnostic datasets, we find that these clusters correspond to different generalization strategies: one cluster behaves like a bag-of-words model under domain shift, while another cluster uses syntactic heuristics. Our work demonstrates how the geometry of the loss surface can guide models towards different heuristic functions.
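To make the notion of a linear-path barrier concrete, here is a small PyTorch sketch that interpolates the parameters of two finetuned models and evaluates loss along the path; the step count and the `eval_loss` helper are assumptions for illustration, not the authors' exact procedure.

```python
import copy
import torch

def linear_path_losses(model_a, model_b, eval_loss, steps: int = 11):
    """Evaluate test loss along the linear path between two finetuned models.

    `eval_loss(model)` is assumed to return held-out loss for a model.
    A spike in the returned list indicates a barrier, i.e., the two
    endpoints occupy different basins of the loss surface.
    """
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        mixed = {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}
        probe.load_state_dict(mixed)
        losses.append(eval_loss(probe))
    return losses
```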
The Annals of the Joseon Dynasty (AJD) contain the daily records of the Kings of Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals were originally written in an archaic Korean writing system, `Hanja', and were translated into Korean from 1968 to 1993. The resulting translation, however, was too literal and contained many archaic Korean words; thus, a new expert translation effort began in 2012. In the decade since, the records of only one king have been completed. In parallel, expert translators are working on an English translation, also at a slow pace, and have produced only one king's records in English so far. We therefore propose H2KE, a neural machine translation model that translates historical documents in Hanja into more easily understandable Korean and into English. Built on top of multilingual neural machine translation, H2KE learns to translate a historical document written in Hanja from both a full dataset of the outdated Korean translation and a small dataset of more recently translated contemporary Korean and English. We compare our method against two baselines: a recent model that simultaneously learns to restore and translate Hanja historical documents, and a Transformer-based model trained only on newly translated corpora. The experiments reveal that our method significantly outperforms the baselines in terms of BLEU scores for both contemporary Korean and English translations. We further conduct an extensive human evaluation, which shows that our translation is preferred over the original expert translations by both experts and non-expert Korean speakers.
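A minimal sketch of the standard multilingual-NMT convention of prepending a target-language token, which is presumably how a single model can serve both Korean and English outputs; the tag format, helper name, and placeholder strings are illustrative assumptions, not the authors' exact scheme.

```python
# Illustrative construction of multilingual training pairs from one Hanja source.

def make_example(hanja_src: str, target_text: str, target_lang: str) -> dict:
    """Prepend a target-language token so one model handles both directions."""
    return {"src": f"<2{target_lang}> {hanja_src}", "tgt": target_text}

pairs = [
    make_example("<hanja sentence>", "<contemporary Korean translation>", "ko"),
    make_example("<hanja sentence>", "<English translation>", "en"),
]
```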
We present $\textbf{MolT5}$ $-$ a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. $\textbf{MolT5}$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (altogether: translation between molecules and language), which we explore for the first time. Since $\textbf{MolT5}$ pretrains models on single-modal data, it helps overcome the data-scarcity shortcoming of the chemistry domain. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate the tasks of molecule captioning and text-based molecule generation. Our results show that $\textbf{MolT5}$-based models are able to generate outputs, both molecules and captions, which in many cases are of high quality.
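A sketch of how text-based de novo molecule generation might be run with a T5-style model through the Hugging Face API; the checkpoint path and prompt below are placeholders, not a released MolT5 artifact or the authors' exact inference setup.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Hypothetical checkpoint path; substitute an actual MolT5 checkpoint.
checkpoint = "path/to/molt5-checkpoint"
tokenizer = T5Tokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Text-to-molecule generation: describe a molecule, decode a SMILES string.
prompt = "The molecule is a colorless liquid commonly used as a solvent."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```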
We hypothesize that due to the greedy nature of learning in multimodal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate a model's dependence on each modality, we compute the gain in accuracy the model achieves when it also has access to the other modalities. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, ModelNet40, and NVIDIA Dynamic Hand Gesture.
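Reading the definition from the abstract literally, the conditional utilization rate of a modality can be sketched as the accuracy gain from having access to it; the withholding strategy mentioned in the comment is an assumption for illustration, not necessarily the authors' procedure.

```python
def conditional_utilization_rate(acc_full: float, acc_without_m: float) -> float:
    """Gain in accuracy from also having access to modality m.

    `acc_full` is the multimodal model's accuracy with all modalities;
    `acc_without_m` is its accuracy when modality m is withheld
    (e.g., replaced by zeros or an uninformative input).
    """
    return acc_full - acc_without_m

# Example: 0.91 accuracy with audio+video but 0.90 with video withheld gives
# video a conditional utilization rate of 0.01, i.e., the model relies
# almost entirely on audio.
```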
Despite the success of mixup in data augmentation, its applicability to natural language processing (NLP) tasks has been limited due to the discrete and variable-length nature of natural language. Recent studies have therefore relied on domain-specific heuristics and manually crafted resources, such as dictionaries, in order to apply mixup in NLP. In this paper, we propose an unsupervised learning approach to text interpolation for data augmentation, which we refer to as Learning to INterpolate for Data Augmentation (LINDA); it requires no heuristics or manually crafted resources, but instead learns to interpolate between any pair of natural language sentences over a natural language manifold. After empirically demonstrating LINDA's interpolation capability, we show that LINDA indeed allows us to seamlessly apply mixup in NLP and leads to better generalization in text classification, both in-domain and out-of-domain.
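For contrast, here is the vanilla mixup formulation that LINDA generalizes; it assumes continuous, fixed-dimensional inputs, which is precisely what discrete, variable-length sentences violate, and it is shown here only as the baseline technique, not as LINDA itself.

```python
import numpy as np

def mixup(x_i: np.ndarray, x_j: np.ndarray,
          y_i: np.ndarray, y_j: np.ndarray, alpha: float = 0.2):
    """Standard mixup: a convex combination of two training examples.

    Works only when x_i and x_j live in the same continuous space and have
    the same shape, which is why applying it directly to raw text is hard.
    """
    lam = np.random.beta(alpha, alpha)
    x_tilde = lam * x_i + (1 - lam) * x_j
    y_tilde = lam * y_i + (1 - lam) * y_j
    return x_tilde, y_tilde
```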